Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation

Authors

  • Josep Domingo-Ferrer
  • Francesc Sebé
Abstract

Microaggregation is a special clustering problem in which the goal is to partition a set of points into groups of at least k points each, so that the groups are as homogeneous as possible. Microaggregation arises in connection with the anonymization of statistical databases for privacy protection (k-anonymity), where points correspond to database records. A usual group homogeneity criterion is minimization of the within-groups sum of squares (SSE). For multivariate points, optimal microaggregation, i.e. microaggregation with minimum SSE, has been shown to be NP-hard. Recently, a polynomial-time O(k)-approximation heuristic has been proposed (previous heuristics in the literature offered no approximation bounds). The special case k = 2 (2-microaggregation) is interesting in privacy protection scenarios with neither internal intruders nor outliers, because smaller groups imply lower information loss. For 2-microaggregation, the existing general approximation can only guarantee a 36-approximation. We give here a new polynomial-time heuristic whose SSE is at most twice the minimum SSE (2-approximation).
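To make the SSE criterion concrete, the following minimal Python sketch computes the within-groups sum of squares for a given partition and checks the minimum group size constraint. It is only an illustration of the objective described in the abstract, not the 2-approximation heuristic of the paper; the function names (`centroid`, `sse`, `is_valid_microaggregation`) and the sample points are assumptions made for this example.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(group: List[Point]) -> List[float]:
    """Return the coordinate-wise mean of a non-empty group of points."""
    n = len(group)
    dim = len(group[0])
    return [sum(p[d] for p in group) / n for d in range(dim)]


def sse(groups: List[List[Point]]) -> float:
    """Within-groups sum of squares: squared Euclidean distance of every
    point to the centroid of its own group, summed over all groups."""
    total = 0.0
    for group in groups:
        c = centroid(group)
        for p in group:
            total += sum((x - m) ** 2 for x, m in zip(p, c))
    return total


def is_valid_microaggregation(groups: List[List[Point]], k: int) -> bool:
    """Check the k-anonymity constraint: every group has at least k points."""
    return all(len(g) >= k for g in groups)


# Hypothetical sample data: four 2-dimensional records, grouped with k = 2.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
near_pairs = [[points[0], points[1]], [points[2], points[3]]]
far_pairs = [[points[0], points[2]], [points[1], points[3]]]

print(sse(near_pairs))  # 10.0
print(sse(far_pairs))   # 102.0
```

Both partitions satisfy the constraint for k = 2, but pairing nearby records gives a much lower SSE, which is the sense in which a microaggregation is "more homogeneous". An α-approximation guarantees an SSE of at most α times the minimum over all valid partitions.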


Similar resources

A polynomial-time approximation to optimal multivariate microaggregation

Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released without disclosing private information on the underlying individuals. Microaggregation techniques are currently being used by many statistical agencies. The principle of microaggregation is to group o...


A Comparative Study on Microaggregation Techniques for Microdata Protection

Microaggregation is an efficient perturbative Statistical Disclosure Control (SDC) technique for microdata protection. It is a unified approach and naturally satisfies k-anonymity without generalization or suppression of data. Various microaggregation techniques, both fixed-size and data-oriented, for univariate and multivariate data exist in the literature. These methods have been evaluated using t...


Beyond Multivariate Microaggregation for Large Record Anonymization

Microaggregation is one of the most commonly employed microdata protection methods. The basic idea of microaggregation is to anonymize data by aggregating original records into small groups of at least k elements and, therefore, preserving k-anonymity. Usually, in order to avoid information loss, when records are large, i.e., the number of attributes of the data set is large, this data set is s...


LHS-Based Hybrid Microdata vs Rank Swapping and Microaggregation for Numeric Microdata Protection

In previous work by Domingo-Ferrer et al., rank swapping and multivariate microaggregation have been identified as well-performing masking methods for microdata protection. Recently, Dandekar et al. proposed using synthetic microdata in place of original data, generated with the Latin hypercube sampling (LHS) technique. The LHS method focuses on mimicking univariate as well as multivariate s...


Improved Univariate Microaggregation for Integer Values

Privacy issues during data publishing are an increasing concern for the entities involved. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...




Publication date: 2006